PCAs based on genome composition for 3539 coronavirus whole genome sequences. Points are colour-coded and ellipses drawn according to different outcome variables, though the underlying PCA for each bias type is the same. Mouseover gives outcome variable and virus name.
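The notes do not state how composition was computed for each bias type. As a minimal sketch, assuming one of the bias types is relative codon frequency over an in-frame coding sequence, a per-sequence feature vector could be built as below. The function name `codon_frequencies` and the toy sequence are hypothetical, not taken from the analysis.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed order, so every sequence yields a comparable vector
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]

def codon_frequencies(cds):
    """Return relative frequencies of the 64 codons in an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # drop any trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    # Codons containing ambiguous bases (e.g. N) are ignored in the total
    total = sum(counts[c] for c in CODONS)
    if total == 0:
        raise ValueError("no unambiguous codons found")
    return [counts[c] / total for c in CODONS]

# Hypothetical toy sequence: ATG GCT GCT AAA TGA
toy = "ATGGCTGCTAAATGA"
freqs = codon_frequencies(toy)
```

Stacking one such vector per sequence gives the (sequences x features) matrix on which a PCA can be run.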
As for spikes, strong separation, but betacovs are scattered. Outlying cetacean viruses on the left?
As for spikes, tight clusters of individual viruses.
Stronger separation of genera than for spikes, likely because stop codons are not so strongly weighted. Alphacovs still seem to prefer TGA, but the preference is less clear for other genera.
Epidemic coronaviruses are fairly distant from endemic coronaviruses here, unlike for spikes. Nor is there as strong a pattern of stop codon preference across all human coronaviruses (again unlike spikes). Not worth looking at without stop codons, as they’re not overly influential here.
However, when stop codons are excluded, a pattern similar to spikes emerges on PC3/4: although these components explain only a small amount of variance, they give good separation of human vs nonhuman viruses. Real signal or just noise?
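A minimal sketch of this check, assuming the input is a (viruses x codons) frequency table: drop the stop codon columns, rescale each row, rerun the PCA, and read off how much variance PC3 and PC4 carry. The data here are random placeholders, the helper names are hypothetical, and renormalising the rows after dropping columns is an assumption rather than a step confirmed by these notes.

```python
import numpy as np

def pca_explained_variance(X):
    """PCA via SVD on column-centred data; returns (scores, explained variance ratios)."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * S                  # coordinates of each row on each PC
    ratios = S**2 / np.sum(S**2)    # share of total variance per PC, descending
    return scores, ratios

def drop_and_renormalise(X, columns, exclude):
    """Remove the named columns and rescale each row to sum to 1 (assumed step)."""
    keep = [i for i, c in enumerate(columns) if c not in exclude]
    sub = X[:, keep]
    return sub / sub.sum(axis=1, keepdims=True), [columns[i] for i in keep]

# Hypothetical random data standing in for the real frequency table
rng = np.random.default_rng(0)
columns = ["GCT", "GCC", "AAA", "AAG", "CTG", "GAA", "TAA", "TAG", "TGA"]
X = rng.dirichlet(np.ones(len(columns)), size=50)

X_nostop, kept = drop_and_renormalise(X, columns, exclude={"TAA", "TAG", "TGA"})
scores, ratios = pca_explained_variance(X_nostop)

# Fraction of total variance carried by PC3 and PC4 together
pc34_share = ratios[2] + ratios[3]
```

Columns 2 and 3 of `scores` are the PC3/4 coordinates to plot against the human vs nonhuman label. A small `pc34_share` does not by itself settle whether the separation is signal or noise.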
Unlike spikes, very strong separation when the PCA accounts for the whole genome.
As for spikes, reasonable separation between epidemic coronaviruses and endemic coronaviruses!